Raw FASTQ files were quality checked using FastQC v0.11.9, and a summary report was generated using MultiQC v1.8. Reads were aligned to GRCh38 using BWA v0.7.17. Duplicates were marked using GATK MarkDuplicates, and base quality scores were recalibrated using GATK BaseRecalibrator and GATK ApplyBQSR (GATK v4.1.7.0). Somatic variants were then called using GATK Mutect2 (GATK v4.1.7.0). The variants were annotated using Variant Effect Predictor (VEP v99.2) and converted to MAF files using vcf2maf (v1.6.21). All of these steps were run on Nextflow Tower using the standardized nf-core pipeline sarek v2.7.1.
Somatic calls from Mutect2 were visualized across all samples without any prefiltering steps. The samples include tumor samples from three different sequencing batches.
Below is a summary of all unfiltered variants identified in the samples:
Figure 1
Figure 2
The somatic variant calls were then filtered to exclude common variants and variants predicted to have a low or medium deleterious consequence. The first oncoplot shows the top 500 mutated genes in PNF and MPNST samples. The second oncoplot shows mutations in genes of interest in PNF and MPNST samples.
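The filtering and gene-ranking step can be illustrated with a minimal Python sketch. It is not the code used for this analysis. It assumes that common variants carry a `common_variant` tag in the MAF `FILTER` column, that predicted consequence is read from the VEP `IMPACT` column, and that only `HIGH` impact variants are retained. The records and the helper names `filter_variants` and `top_genes` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical records with a few MAF-style columns (illustration only).
records = [
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "S1", "FILTER": "PASS", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "NF1", "Tumor_Sample_Barcode": "S2", "FILTER": "PASS", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1", "FILTER": "common_variant", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "CDKN2A", "Tumor_Sample_Barcode": "S2", "FILTER": "PASS", "IMPACT": "MODERATE"},
    {"Hugo_Symbol": "SUZ12", "Tumor_Sample_Barcode": "S2", "FILTER": "PASS", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S1", "FILTER": "PASS", "IMPACT": "LOW"},
]

# Assumption: "low or medium" consequence maps to every IMPACT level except HIGH.
KEEP_IMPACT = {"HIGH"}


def filter_variants(rows):
    """Drop rows tagged as common variants or lacking a retained IMPACT level."""
    kept = []
    for row in rows:
        # FILTER may hold several tags; accept ';' or ',' as the separator.
        tags = set(re.split(r"[;,]", row["FILTER"]))
        if "common_variant" in tags:
            continue
        if row["IMPACT"] not in KEEP_IMPACT:
            continue
        kept.append(row)
    return kept


def top_genes(rows, n=500):
    """Rank genes by the number of distinct samples carrying a retained variant."""
    samples_per_gene = defaultdict(set)
    for row in rows:
        samples_per_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    # Sort by descending sample count, then alphabetically to break ties.
    ranked = sorted(samples_per_gene, key=lambda g: (-len(samples_per_gene[g]), g))
    return ranked[:n]


filtered = filter_variants(records)
ranked_genes = top_genes(filtered, n=2)
```

Ranking by the number of mutated samples rather than by raw variant count keeps a single hypermutated sample from dominating the gene list. Either choice is an assumption here, since the ranking criterion is not stated above.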
Next, we selected the patients who provided normal, benign, and malignant tissue samples. This set of samples is referred to as “TRIADS”. The patients with triad samples are: “JH-2-002”, “JH-2-015”, “JH-2-016”, “JH-2-023”, “JH-2-031”, “JH-2-045”, “JH-2-055”, “JH-2-084”.
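The triad selection can be sketched as follows, assuming a sample sheet that maps each patient ID to a tissue type. The patient IDs, the tissue labels, and the function name `find_triad_patients` are hypothetical placeholders rather than the actual study metadata.

```python
from collections import defaultdict

# Tissue types a patient must have contributed to count as a triad.
REQUIRED_TISSUES = {"normal", "benign", "malignant"}

# Hypothetical sample sheet: (patient_id, tissue_type) pairs.
samples = [
    ("PT-A", "normal"), ("PT-A", "benign"), ("PT-A", "malignant"),
    ("PT-B", "normal"), ("PT-B", "benign"),
    ("PT-C", "benign"), ("PT-C", "malignant"), ("PT-C", "normal"), ("PT-C", "malignant"),
]


def find_triad_patients(rows):
    """Return sorted IDs of patients with at least one sample of every required tissue."""
    tissues_by_patient = defaultdict(set)
    for patient_id, tissue in rows:
        tissues_by_patient[patient_id].add(tissue)
    return sorted(
        patient_id
        for patient_id, seen in tissues_by_patient.items()
        if REQUIRED_TISSUES <= seen  # subset test: all required tissues present
    )


triad_patients = find_triad_patients(samples)
```

A patient with several samples of the same tissue type (such as `PT-C` above) still counts once, because tissue types are collected into a set per patient.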